Techniques for Accelerating a Grammar-Checker
Abstract
The paper describes several possibilities of using finite-state automata as means for speeding up the performance of a grammar-and-parsing-based (as opposed to pattern-matching-based) grammar-checker able to detect errors from a predefined set. The ideas contained have been successfully implemented in a grammar-checker for Czech, a free-word-order language from the Slavic group.
1 Introduction
This paper describes an efficiency-supporting tool for one of the two grammar-checker technologies developed in the framework of the PECO2824 Joint Research Project sponsored by the European Union. The project, covering Bulgarian and Czech, two free-word-order languages from the Slavic family, was performed between January 1993 and mid 1996 by a consortium consisting of both academic and industrial partners.
The basic philosophy of the technology discussed in this paper¹ is that of linguistic-theoretically sound grammar-and-parsing-based machinery able to detect, by constraint relaxation, errors from a predefined set (as opposed to pattern-matching approaches, which do not seem promising for a free-word-order language). The core of the system (broad-coverage HPSG-based grammars of Bulgarian and Czech, and a single language-independent parser) was developed in the first three years of the project and was then passed to the industrial partners Bulgarian Business System IMC Sofia and Macron Prague, Ltd.
¹ As for the alternative technology, cf. (Holan, Kuboň, and Plátek, 1997).
While the Bulgarian system remained in more or less a demonstrator stage only, the Czech one satisfied Macron's requirements as to syntactic coverage. However, Macron expressed serious worries about the speed of the system, should this be really introduced to the market. Following this, several possibilities of using finite-state automata (FSA) as means for speeding up the performance of the system were designed, developed and implemented, in particular:
• for detecting sentences where none of the predefined errors can occur (thus ruling out such sentences from the procedure of error-search proper)
• for detecting which one(s) of the predefined error types might possibly occur in a particular sentence (hence, cutting down the search space of the error-search proper)
• for detecting errors which are of such a nature that their occurrence might be discovered by a machinery simpler than full-fledged parsing with constraint relaxation
• for splitting (certain cases of) complex sentences into independent clauses, allowing thus for the error-detection to be performed on shorter strings.
Very many of the errors to be discovered by the system can be traced down to mismatches of (values …
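The first two uses of finite-state automata listed above (ruling out sentences in which no predefined error can occur, and narrowing down which error types are possible) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the coarse tag names, the two error types, and the adjacency-based automata are hypothetical and are not taken from the paper, whose actual automata are not described in this excerpt. Adjacency is also a deliberate simplification, since a free-word-order language would need looser patterns.

```python
from typing import Dict, List, Set, Tuple

# A DFA is (transitions, start state, accepting states). A symbol with no
# explicit transition from a state falls back to that state's "*" wildcard.
DFA = Tuple[Dict[Tuple[str, str], str], str, Set[str]]


def make_adjacent_pair_dfa(first: str, second: str) -> DFA:
    """Build a DFA accepting tag sequences in which `first` is directly followed by `second`."""
    transitions = {
        ("start", first): "seen_first",
        ("start", "*"): "start",
        ("seen_first", first): "seen_first",
        ("seen_first", second): "hit",
        ("seen_first", "*"): "start",
        ("hit", "*"): "hit",  # once the pattern is found, stay accepting
    }
    return transitions, "start", {"hit"}


def accepts(dfa: DFA, tags: List[str]) -> bool:
    """Run the DFA over a tag sequence in a single left-to-right pass."""
    transitions, state, accepting = dfa
    for tag in tags:
        state = transitions.get((state, tag), transitions[(state, "*")])
    return state in accepting


# Hypothetical error types, each guarded by one cheap automaton.
ERROR_FILTERS: Dict[str, DFA] = {
    "adj-noun agreement": make_adjacent_pair_dfa("ADJ", "NOUN"),
    "prep-noun case": make_adjacent_pair_dfa("PREP", "NOUN"),
}


def possible_error_types(tags: List[str]) -> Set[str]:
    """Return the error types whose triggering configuration occurs in the sentence."""
    return {name for name, dfa in ERROR_FILTERS.items() if accepts(dfa, tags)}


def needs_full_parse(tags: List[str]) -> bool:
    """A sentence goes to the expensive error search only if some error type is possible."""
    return bool(possible_error_types(tags))
```

In such a setup, a sentence for which `needs_full_parse` returns `False` would skip parsing with constraint relaxation altogether, and for the remaining sentences only the constraints tied to the returned error types would need to be relaxed.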
Similar resources
Using a Grammar Checker for Evaluation and Postprocessing of Statistical Machine Translation
One problem in statistical machine translation (SMT) is that the output often is ungrammatical. To address this issue, we have investigated the use of a grammar checker for two purposes in connection with SMT: as an evaluation tool and as a postprocessing tool. As an evaluation tool the grammar checker gives a complementary picture to standard metrics such as Bleu, which do not account for gram...
CoGrOO: a Brazilian-Portuguese Grammar Checker based on the CETENFOLHA Corpus
This paper describes an ongoing Portuguese-language grammar checker project called CoGrOO, Corretor Gramatical para OpenOffice (Grammar Checker for OpenOffice), based on CETENFOLHA, a morphosyntactically annotated corpus of Brazilian Portuguese. Two of its features are highlighted: a hybrid architecture mixing rules and statistics, and its status as a free software project. This project aims at checking grammatical error...
Using Machine Learning Techniques to Build a Comma Checker for Basque
In this paper, we describe the research using machine learning techniques to build a comma checker to be integrated in a grammar checker for Basque. After several experiments, and trained with a little corpus of 100,000 words, the system guesses correctly not placing commas with a precision of 96% and a recall of 98%. It also gets a precision of 70% and a recall of 49% in the task of plac...
Improving CoGrOO: the Brazilian Portuguese Grammar Checker
This paper highlights the main results obtained in an effort to improve the grammar checker CoGrOO, a hybrid system which initially annotates the text using statistical Natural Language Processing (NLP) techniques, and then applies a rule-based analysis to identify possible grammar errors. The goal was to reduce omissions and false alarms while improving true positives without adding new error ru...
A rule-based Afan Oromo Grammar Checker
Natural language processing (NLP) is a subfield of computer science, with strong connections to artificial intelligence. One area of NLP is concerned with creating proofing systems, such as grammar checkers. A grammar checker determines the syntactic correctness of a sentence and is mostly used in word processors and compilers. For languages, such as Afan Oromo, advanced tools have been lackin...
Publication date: 1997